Recall that the residuals in the first model attempt were rather structured, indicating a major issue in the model of unknown source. In order to address for this structure and to improve the general model fit in the process, several model modifications are proposed. These include the transformation of the target functions, adding more PCs of the climate variables, adding non-linear effects and checking for possible interactions.
As a reminder, the summary output of the initial model is given here:
## Response PC1 :
##
## Call:
## lm(formula = PC1 ~ Lon + Lat + sand_fraction + silt_fraction +
## bulkdensity_soil + ph_soil + soilcarbon + time_since_dist +
## Scenario + initial_recruitment_BNE + initial_recruitment_IBS +
## initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC +
## recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS +
## recruitment_ten_years_Tundra + recruitment_ten_years_otherC +
## previous_state_BNE + previous_state_IBS + previous_state_TeBS +
## previous_state_Tundra + previous_state_otherC + PC1_temp +
## PC1_precip, data = d_train %>% select(-c(clay_fraction)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6863 -1.9650 -0.0036 1.9651 7.4673
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.199e+01 1.962e+00 -11.209 < 2e-16 ***
## Lon -6.785e-03 8.459e-04 -8.021 2.19e-15 ***
## Lat 2.605e-01 1.631e-02 15.979 < 2e-16 ***
## sand_fraction 2.192e+01 1.223e+00 17.913 < 2e-16 ***
## silt_fraction 1.410e+01 2.113e+00 6.671 3.64e-11 ***
## bulkdensity_soil -3.708e+00 7.292e-01 -5.085 4.17e-07 ***
## ph_soil -3.084e-01 1.710e-01 -1.804 0.07149 .
## soilcarbon -4.043e-01 6.431e-02 -6.287 4.30e-10 ***
## time_since_dist 4.360e-04 5.782e-04 0.754 0.45098
## ScenarioSSP1-RCP2.6 -1.794e+00 2.126e-01 -8.439 < 2e-16 ***
## ScenarioSSP3-RCP7.0 -1.309e+00 2.409e-01 -5.431 6.60e-08 ***
## ScenarioSSP5-RCP8.5 -1.343e+00 2.590e-01 -5.184 2.49e-07 ***
## initial_recruitment_BNE 1.084e-02 2.404e-03 4.507 7.12e-06 ***
## initial_recruitment_IBS 2.791e-03 4.060e-03 0.687 0.49192
## initial_recruitment_TeBS 4.896e-02 1.832e-02 2.673 0.00761 **
## initial_recruitment_Tundra -9.699e-03 3.652e-03 -2.656 0.00801 **
## initial_recruitment_otherC -9.609e-03 4.702e-03 -2.044 0.04116 *
## recruitment_ten_years_BNE -3.912e-04 1.731e-04 -2.260 0.02395 *
## recruitment_ten_years_IBS 1.260e-03 6.809e-04 1.850 0.06455 .
## recruitment_ten_years_TeBS -6.401e-03 3.263e-03 -1.961 0.05003 .
## recruitment_ten_years_Tundra 3.391e-04 2.358e-04 1.438 0.15068
## recruitment_ten_years_otherC 6.490e-04 1.414e-03 0.459 0.64629
## previous_state_BNE 4.760e-03 2.538e-02 0.188 0.85125
## previous_state_IBS -7.137e-03 1.818e-02 -0.393 0.69465
## previous_state_TeBS 1.657e-01 1.141e-01 1.452 0.14666
## previous_state_Tundra 5.411e-03 1.529e-01 0.035 0.97179
## previous_state_otherC -1.832e-01 3.789e-02 -4.834 1.49e-06 ***
## PC1_temp 6.788e-02 3.771e-03 18.002 < 2e-16 ***
## PC1_precip -6.060e-04 3.770e-05 -16.073 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.632 on 1414 degrees of freedom
## Multiple R-squared: 0.6317, Adjusted R-squared: 0.6244
## F-statistic: 86.61 on 28 and 1414 DF, p-value: < 2.2e-16
##
##
## Response PC2 :
##
## Call:
## lm(formula = PC2 ~ Lon + Lat + sand_fraction + silt_fraction +
## bulkdensity_soil + ph_soil + soilcarbon + time_since_dist +
## Scenario + initial_recruitment_BNE + initial_recruitment_IBS +
## initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC +
## recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS +
## recruitment_ten_years_Tundra + recruitment_ten_years_otherC +
## previous_state_BNE + previous_state_IBS + previous_state_TeBS +
## previous_state_Tundra + previous_state_otherC + PC1_temp +
## PC1_precip, data = d_train %>% select(-c(clay_fraction)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.1278 -0.2266 0.2995 0.7167 8.0355
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.101e+00 1.093e+00 1.923 0.054667 .
## Lon -4.360e-05 4.711e-04 -0.093 0.926271
## Lat -3.390e-02 9.081e-03 -3.733 0.000197 ***
## sand_fraction -3.266e-01 6.814e-01 -0.479 0.631762
## silt_fraction -2.080e+00 1.177e+00 -1.767 0.077475 .
## bulkdensity_soil 3.048e-01 4.061e-01 0.750 0.453082
## ph_soil 3.194e-02 9.523e-02 0.335 0.737422
## soilcarbon 3.125e-02 3.582e-02 0.873 0.383077
## time_since_dist -2.938e-04 3.221e-04 -0.912 0.361716
## ScenarioSSP1-RCP2.6 4.812e-01 1.184e-01 4.064 5.08e-05 ***
## ScenarioSSP3-RCP7.0 7.720e-01 1.342e-01 5.753 1.07e-08 ***
## ScenarioSSP5-RCP8.5 7.206e-01 1.443e-01 4.995 6.63e-07 ***
## initial_recruitment_BNE 3.832e-04 1.339e-03 0.286 0.774810
## initial_recruitment_IBS -8.366e-03 2.261e-03 -3.699 0.000224 ***
## initial_recruitment_TeBS -1.302e-01 1.020e-02 -12.762 < 2e-16 ***
## initial_recruitment_Tundra 2.018e-03 2.034e-03 0.992 0.321272
## initial_recruitment_otherC -3.382e-03 2.619e-03 -1.291 0.196776
## recruitment_ten_years_BNE 7.787e-05 9.638e-05 0.808 0.419234
## recruitment_ten_years_IBS 1.375e-03 3.792e-04 3.625 0.000299 ***
## recruitment_ten_years_TeBS 1.821e-02 1.817e-03 10.018 < 2e-16 ***
## recruitment_ten_years_Tundra -2.333e-04 1.313e-04 -1.776 0.075869 .
## recruitment_ten_years_otherC 1.093e-04 7.875e-04 0.139 0.889650
## previous_state_BNE -3.092e-02 1.414e-02 -2.187 0.028886 *
## previous_state_IBS -3.948e-03 1.012e-02 -0.390 0.696571
## previous_state_TeBS -3.350e-01 6.354e-02 -5.272 1.56e-07 ***
## previous_state_Tundra -2.054e-01 8.518e-02 -2.412 0.016003 *
## previous_state_otherC -9.795e-02 2.110e-02 -4.642 3.78e-06 ***
## PC1_temp 2.264e-02 2.100e-03 10.781 < 2e-16 ***
## PC1_precip 3.523e-05 2.100e-05 1.678 0.093629 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 1414 degrees of freedom
## Multiple R-squared: 0.3486, Adjusted R-squared: 0.3357
## F-statistic: 27.03 on 28 and 1414 DF, p-value: < 2.2e-16
Figure 1 shows the residuals and Q-Q plots from last week:
Figure 2 shows the true PC scores vs. the predicted PC scores for the test set.
Figure 3 shows for four randomly picked disturbed grid cells, one for each scenario, the true functional fit and the predicted one from the model.
As a first step to improve the model and to address the structure in the residuals, several configurations of covariates are tested, here as linear effects. None of these attempts improved the model substantially. It became clear that no matter which variables are in the model, the structure in the residuals is present. This leads to the conclusion that not one (or multiple) covariates are responsible, but rather the target variable or the model itself.
A common technique in variable selection is a stepwise approach based on AIC. In particular this means that for each considered variable configuration, the AIC is derived by
where \(n\) is the number of observations, \(k\) is the number of parameters in the model, and \(r\) is the number of response variables. With that, adding and removing covariates is assessed in a stepwise manner.
The resulting model improves the AIC only marginally (from 12103 to
12099) and removes variable recruitment_ten_years_otherC.
Removing this covariate only is not an option though, since eather all
five recruitment_ten_years variables should be included, or
none of the five. In addition, removing this covariate does not account
for the structured residuals.
Note that in this approach, only linear effects and no interactions are considered.
Several numbers of PCs were checked with no change to the structural residuals. Therefore, the model outputs and corresponding plots are not reported here.
Several combinations of smooth effects were tested. For variables
Lon and PC1_temp it might make sense to
include additive effects. However, this does not change the structural
behavior. The output of the model is given in the following:
##
## Family: Multivariate normal
## Link function:
##
## Formula:
## PC1 ~ s(Lon) + Lat + sand_fraction + silt_fraction + bulkdensity_soil +
## ph_soil + soilcarbon + Scenario + time_since_dist + initial_recruitment_BNE +
## initial_recruitment_IBS + initial_recruitment_otherC + initial_recruitment_TeBS +
## initial_recruitment_Tundra + recruitment_ten_years_BNE +
## recruitment_ten_years_IBS + recruitment_ten_years_otherC +
## recruitment_ten_years_TeBS + recruitment_ten_years_Tundra +
## previous_state_BNE + previous_state_IBS + previous_state_otherC +
## previous_state_TeBS + previous_state_Tundra + s(PC1_temp) +
## PC1_precip + PC1_temp:PC1_precip
## PC2 ~ s(Lon) + Lat + sand_fraction + silt_fraction + bulkdensity_soil +
## ph_soil + soilcarbon + Scenario + time_since_dist + initial_recruitment_BNE +
## initial_recruitment_IBS + initial_recruitment_otherC + initial_recruitment_TeBS +
## initial_recruitment_Tundra + recruitment_ten_years_BNE +
## recruitment_ten_years_IBS + recruitment_ten_years_otherC +
## recruitment_ten_years_TeBS + recruitment_ten_years_Tundra +
## previous_state_BNE + previous_state_IBS + previous_state_otherC +
## previous_state_TeBS + previous_state_Tundra + s(PC1_temp) +
## PC1_precip + PC1_temp:PC1_precip
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.151e+01 2.746e+00 -4.192 2.76e-05 ***
## Lat 2.687e-03 3.990e-02 0.067 0.946309
## sand_fraction 2.069e+01 1.098e+00 18.842 < 2e-16 ***
## silt_fraction 1.196e+01 1.879e+00 6.367 1.92e-10 ***
## bulkdensity_soil -2.380e+00 6.883e-01 -3.458 0.000545 ***
## ph_soil 9.795e-02 1.655e-01 0.592 0.553973
## soilcarbon -2.096e-01 6.191e-02 -3.385 0.000711 ***
## ScenarioSSP1-RCP2.6 -8.654e-01 2.135e-01 -4.054 5.04e-05 ***
## ScenarioSSP3-RCP7.0 5.844e-01 2.879e-01 2.030 0.042343 *
## ScenarioSSP5-RCP8.5 8.045e-01 3.321e-01 2.422 0.015422 *
## time_since_dist 3.895e-04 5.063e-04 0.769 0.441749
## initial_recruitment_BNE 7.321e-03 2.122e-03 3.449 0.000562 ***
## initial_recruitment_IBS 2.174e-03 3.569e-03 0.609 0.542400
## initial_recruitment_otherC -1.128e-02 4.112e-03 -2.743 0.006095 **
## initial_recruitment_TeBS 1.401e-02 1.656e-02 0.846 0.397617
## initial_recruitment_Tundra -4.262e-03 3.196e-03 -1.333 0.182393
## recruitment_ten_years_BNE -1.484e-04 1.526e-04 -0.972 0.330995
## recruitment_ten_years_IBS 8.387e-04 5.975e-04 1.404 0.160386
## recruitment_ten_years_otherC 2.548e-04 1.236e-03 0.206 0.836699
## recruitment_ten_years_TeBS -2.781e-03 2.890e-03 -0.962 0.335859
## recruitment_ten_years_Tundra -8.183e-05 2.067e-04 -0.396 0.692206
## previous_state_BNE 1.224e-02 2.225e-02 0.550 0.582352
## previous_state_IBS 1.856e-02 1.608e-02 1.154 0.248584
## previous_state_otherC -1.357e-01 3.382e-02 -4.011 6.04e-05 ***
## previous_state_TeBS 1.339e-01 9.962e-02 1.344 0.179056
## previous_state_Tundra 1.151e-01 1.340e-01 0.859 0.390148
## PC1_precip -4.042e-04 4.698e-05 -8.604 < 2e-16 ***
## PC1_precip:PC1_temp -8.750e-07 1.013e-06 -0.864 0.387606
## (Intercept).1 -6.782e-02 1.420e+00 -0.048 0.961921
## Lat.1 -1.988e-02 2.065e-02 -0.962 0.335876
## sand_fraction.1 4.832e-01 5.663e-01 0.853 0.393497
## silt_fraction.1 -1.794e+00 9.690e-01 -1.851 0.064176 .
## bulkdensity_soil.1 2.117e-01 3.550e-01 0.596 0.550990
## ph_soil.1 2.077e-01 8.546e-02 2.430 0.015095 *
## soilcarbon.1 2.437e-02 3.192e-02 0.763 0.445323
## ScenarioSSP1-RCP2.6.1 5.233e-02 1.101e-01 0.475 0.634550
## ScenarioSSP3-RCP7.0.1 4.294e-01 1.486e-01 2.890 0.003858 **
## ScenarioSSP5-RCP8.5.1 7.131e-01 1.716e-01 4.156 3.24e-05 ***
## time_since_dist.1 -1.100e-04 2.613e-04 -0.421 0.673894
## initial_recruitment_BNE.1 8.560e-05 1.095e-03 0.078 0.937700
## initial_recruitment_IBS.1 -4.230e-03 1.841e-03 -2.298 0.021543 *
## initial_recruitment_otherC.1 -2.440e-03 2.122e-03 -1.150 0.250177
## initial_recruitment_TeBS.1 -7.602e-02 8.697e-03 -8.742 < 2e-16 ***
## initial_recruitment_Tundra.1 2.219e-04 1.648e-03 0.135 0.892884
## recruitment_ten_years_BNE.1 3.091e-05 7.871e-05 0.393 0.694588
## recruitment_ten_years_IBS.1 9.654e-04 3.082e-04 3.133 0.001732 **
## recruitment_ten_years_otherC.1 4.427e-04 6.379e-04 0.694 0.487715
## recruitment_ten_years_TeBS.1 1.178e-02 1.503e-03 7.842 4.45e-15 ***
## recruitment_ten_years_Tundra.1 -1.554e-04 1.066e-04 -1.457 0.145017
## previous_state_BNE.1 -6.702e-03 1.148e-02 -0.584 0.559482
## previous_state_IBS.1 -2.271e-03 8.294e-03 -0.274 0.784274
## previous_state_otherC.1 -7.876e-02 1.746e-02 -4.511 6.45e-06 ***
## previous_state_TeBS.1 -2.402e-01 5.137e-02 -4.676 2.92e-06 ***
## previous_state_Tundra.1 -1.582e-01 6.915e-02 -2.288 0.022112 *
## PC1_precip.1 2.078e-05 2.432e-05 0.854 0.392853
## PC1_precip:PC1_temp.1 3.037e-06 5.249e-07 5.786 7.21e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(Lon) 8.236 8.843 429.48 <2e-16 ***
## s(PC1_temp) 5.430 6.642 383.44 <2e-16 ***
## s.1(Lon) 8.264 8.854 69.54 <2e-16 ***
## s.1(PC1_temp) 7.729 8.610 609.11 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Deviance explained = 66%
## -REML = 3193.5 Scale est. = 1 n = 1443
Lon and PC1_temp for the
additive model (on train data set).Figure 11 shows the residuals and Q-Q plots. Again, there is no major difference to the residuals and Q-Q plots from Figure 1.
The two plots indicating the goodness of prediction are leaved out here.
To improve the general model fit, several interaction terms are evaluated:
Lon - Lat: small effects, significant for
both PCssand_fraction - silt_fraction: very small
effects, significant for PC 1 onlyPC1_temp - PC1_precip: medium effects,
significant for PC 2 onlyThese interactions result from ‘logical’ considerations, but should be double-checked with Lucia.
The model output is given by:
## Response PC1 :
##
## Call:
## lm(formula = PC1 ~ Lon + Lat + sand_fraction + silt_fraction +
## bulkdensity_soil + ph_soil + soilcarbon + time_since_dist +
## Scenario + initial_recruitment_BNE + initial_recruitment_IBS +
## initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC +
## recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS +
## recruitment_ten_years_Tundra + recruitment_ten_years_otherC +
## previous_state_BNE + previous_state_IBS + previous_state_TeBS +
## previous_state_Tundra + previous_state_otherC + PC1_temp +
## PC1_precip + Lon:Lat + sand_fraction:silt_fraction + PC1_temp:PC1_precip,
## data = d_train %>% dplyr::select(-c(clay_fraction)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.9431 -1.9021 -0.1249 1.9098 6.6050
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.910e+01 2.438e+00 -11.936 < 2e-16 ***
## Lon 6.728e-02 1.167e-02 5.767 9.91e-09 ***
## Lat 2.783e-01 1.659e-02 16.780 < 2e-16 ***
## sand_fraction 2.978e+01 2.731e+00 10.907 < 2e-16 ***
## silt_fraction 2.541e+01 4.341e+00 5.853 5.98e-09 ***
## bulkdensity_soil -2.541e+00 7.446e-01 -3.412 0.000662 ***
## ph_soil -2.545e-01 1.695e-01 -1.502 0.133424
## soilcarbon -2.660e-01 6.683e-02 -3.980 7.24e-05 ***
## time_since_dist 3.561e-04 5.684e-04 0.627 0.531034
## ScenarioSSP1-RCP2.6 -2.040e+00 2.136e-01 -9.550 < 2e-16 ***
## ScenarioSSP3-RCP7.0 -1.695e+00 2.439e-01 -6.951 5.54e-12 ***
## ScenarioSSP5-RCP8.5 -1.823e+00 2.644e-01 -6.893 8.23e-12 ***
## initial_recruitment_BNE 9.675e-03 2.368e-03 4.086 4.63e-05 ***
## initial_recruitment_IBS 4.236e-03 3.994e-03 1.060 0.289101
## initial_recruitment_TeBS 4.443e-02 1.806e-02 2.460 0.014011 *
## initial_recruitment_Tundra -8.874e-03 3.589e-03 -2.473 0.013532 *
## initial_recruitment_otherC -8.847e-03 4.622e-03 -1.914 0.055813 .
## recruitment_ten_years_BNE -3.474e-04 1.701e-04 -2.042 0.041314 *
## recruitment_ten_years_IBS 1.037e-03 6.697e-04 1.548 0.121896
## recruitment_ten_years_TeBS -6.037e-03 3.210e-03 -1.881 0.060239 .
## recruitment_ten_years_Tundra 3.275e-04 2.318e-04 1.413 0.157839
## recruitment_ten_years_otherC 5.109e-04 1.389e-03 0.368 0.713060
## previous_state_BNE 1.338e-03 2.499e-02 0.054 0.957293
## previous_state_IBS -6.364e-03 1.787e-02 -0.356 0.721821
## previous_state_TeBS 1.690e-01 1.122e-01 1.506 0.132179
## previous_state_Tundra 6.545e-03 1.502e-01 0.044 0.965260
## previous_state_otherC -1.781e-01 3.744e-02 -4.756 2.18e-06 ***
## PC1_temp 6.345e-02 3.756e-03 16.896 < 2e-16 ***
## PC1_precip -7.359e-04 4.344e-05 -16.942 < 2e-16 ***
## Lon:Lat -1.289e-03 2.016e-04 -6.396 2.16e-10 ***
## sand_fraction:silt_fraction -2.254e+01 6.880e+00 -3.276 0.001080 **
## PC1_temp:PC1_precip -1.508e-06 9.693e-07 -1.556 0.120048
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.585 on 1411 degrees of freedom
## Multiple R-squared: 0.6454, Adjusted R-squared: 0.6377
## F-statistic: 82.86 on 31 and 1411 DF, p-value: < 2.2e-16
##
##
## Response PC2 :
##
## Call:
## lm(formula = PC2 ~ Lon + Lat + sand_fraction + silt_fraction +
## bulkdensity_soil + ph_soil + soilcarbon + time_since_dist +
## Scenario + initial_recruitment_BNE + initial_recruitment_IBS +
## initial_recruitment_TeBS + initial_recruitment_Tundra + initial_recruitment_otherC +
## recruitment_ten_years_BNE + recruitment_ten_years_IBS + recruitment_ten_years_TeBS +
## recruitment_ten_years_Tundra + recruitment_ten_years_otherC +
## previous_state_BNE + previous_state_IBS + previous_state_TeBS +
## previous_state_Tundra + previous_state_otherC + PC1_temp +
## PC1_precip + Lon:Lat + sand_fraction:silt_fraction + PC1_temp:PC1_precip,
## data = d_train %>% dplyr::select(-c(clay_fraction)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0715 -0.2195 0.2604 0.6879 7.6522
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.161e+00 1.365e+00 0.851 0.394849
## Lon -1.603e-02 6.530e-03 -2.454 0.014240 *
## Lat -3.921e-02 9.284e-03 -4.223 2.56e-05 ***
## sand_fraction 2.507e+00 1.528e+00 1.641 0.101114
## silt_fraction 2.476e+00 2.430e+00 1.019 0.308373
## bulkdensity_soil -2.159e-01 4.167e-01 -0.518 0.604432
## ph_soil 1.010e-01 9.486e-02 1.064 0.287364
## soilcarbon -1.830e-02 3.740e-02 -0.489 0.624741
## time_since_dist -3.025e-04 3.181e-04 -0.951 0.341881
## ScenarioSSP1-RCP2.6 4.042e-01 1.196e-01 3.380 0.000744 ***
## ScenarioSSP3-RCP7.0 7.273e-01 1.365e-01 5.327 1.16e-07 ***
## ScenarioSSP5-RCP8.5 6.767e-01 1.480e-01 4.572 5.26e-06 ***
## initial_recruitment_BNE 2.814e-04 1.325e-03 0.212 0.831830
## initial_recruitment_IBS -8.733e-03 2.236e-03 -3.906 9.82e-05 ***
## initial_recruitment_TeBS -1.248e-01 1.011e-02 -12.343 < 2e-16 ***
## initial_recruitment_Tundra 2.025e-03 2.009e-03 1.008 0.313722
## initial_recruitment_otherC -2.934e-03 2.587e-03 -1.134 0.256870
## recruitment_ten_years_BNE 8.613e-05 9.521e-05 0.905 0.365826
## recruitment_ten_years_IBS 1.454e-03 3.749e-04 3.880 0.000109 ***
## recruitment_ten_years_TeBS 1.761e-02 1.797e-03 9.799 < 2e-16 ***
## recruitment_ten_years_Tundra -2.563e-04 1.297e-04 -1.975 0.048407 *
## recruitment_ten_years_otherC 1.558e-04 7.774e-04 0.200 0.841174
## previous_state_BNE -2.542e-02 1.398e-02 -1.818 0.069332 .
## previous_state_IBS -3.949e-03 1.000e-02 -0.395 0.693047
## previous_state_TeBS -3.264e-01 6.279e-02 -5.198 2.31e-07 ***
## previous_state_Tundra -2.139e-01 8.409e-02 -2.543 0.011094 *
## previous_state_otherC -8.429e-02 2.096e-02 -4.022 6.07e-05 ***
## PC1_temp 2.366e-02 2.102e-03 11.257 < 2e-16 ***
## PC1_precip 1.424e-05 2.431e-05 0.586 0.558165
## Lon:Lat 2.647e-04 1.128e-04 2.346 0.019114 *
## sand_fraction:silt_fraction -7.178e+00 3.851e+00 -1.864 0.062531 .
## PC1_temp:PC1_precip -2.693e-06 5.425e-07 -4.964 7.75e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.447 on 1411 degrees of freedom
## Multiple R-squared: 0.3668, Adjusted R-squared: 0.3528
## F-statistic: 26.36 on 31 and 1411 DF, p-value: < 2.2e-16
Since the residuals and the prediction accuracy hardly changes, these plots are left out here.
In order to assess whether considering the PFTs separately might mitigate the structure, the initial model is fitted to FPCA scores using one PFT at a time. The following figures show the residuals for each PFT.
Interestingly, the general residual shape in Figure 1 resembles strongly to that of both Needleleaf evergreen and Pioneering broadleaf in Figures 13 and 14, respectively. However, the residuals are bounded for all five PFTs, but they differ strongly in the shape of the structure.
One possible explanation for the structured residuals is that the model is not compatible with the MFPCA scores. In order to investigate this further, the model is fit to the original data. Clearly, this is not possible for the whole data set, but the model can be fed with shares of specific time points and specific PFTs. Since the analysis above indicated a strong influence of Needleleaf evergreen (BNE), in the following, this PFT is considered for different time points. Figure 18 shows the residuals for five different time points, namely after 20, 40, 60, 80 and 100 years of recovery.
Apparently, the longer the recovery period, the more this structure becomes apparent. So the thought of MFPCA scores are not compatible with the model is not confirmed.
This leads to the question: If it is not the target data, nor non-linear or interaction effects, and not because of the FPCA scores for temperature and precipitation - what else could be the reason for the structured residuals?
Conclusion
After the attempts above and even more non-shown here to mitigate the structure in the residuals, the question, why the residuals show this structure, remains. However, I can think of two possible reasons:
The following table shows AIC values for all presented approaches. Clearly, the additive model wins in terms of lowest values, while the model fitted to the absolute `cmass’ values has twice the AIC values.
| Initial model | Stepwise selection | Softmax | Absolute cmass | Additive model | Interactions | |
|---|---|---|---|---|---|---|
| AIC | 12103 | 12099 | 10251 | 24927 | 8382 | 12020 |
Here, each covariate is plotted against the residuals for the first model: